Review for NeurIPS paper: Meta-trained agents implement Bayes-optimal agents
Weaknesses: Perhaps the rhetoric of the paper is a little overheated, with lots of ITALICS and claims of novelty and significance that exceed the actual findings (see below). The basic ideas do not seem all that "striking": of course, if we have a parameterized policy family that includes the optimal policy by design, and we train it with feedback that the optimal policy maximizes, then it works.

Note that the "target distribution" can be viewed as an initial stochastic step in a single (PO)MDP that samples the problem parameters, so the whole process is simply learning a policy for that one POMDP. Where [10] (for which the authors are of course not necessarily responsible) says "Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem," this is exactly what Monte Carlo RL does.
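To make the "initial stochastic step" construction concrete, here is a minimal sketch (all names hypothetical, not from the paper under review): a task distribution over two-armed Bernoulli bandits is folded into one POMDP whose reset samples the hidden parameters, so meta-training over the distribution is ordinary Monte Carlo RL on that single POMDP.

```python
import random

class BanditTask:
    """One sampled problem instance: a two-armed Bernoulli bandit."""
    def __init__(self, p0, p1):
        self.p = (p0, p1)

    def pull(self, arm):
        return 1.0 if random.random() < self.p[arm] else 0.0

class MetaPOMDP:
    """A single POMDP whose initial stochastic transition samples the
    problem parameters from the 'target distribution'. The parameters
    stay hidden, so the agent sees only its action-reward history."""
    def __init__(self, horizon=10):
        self.horizon = horizon

    def reset(self):
        # Hidden initial step: sample the task parameters.
        self.task = BanditTask(random.random(), random.random())
        self.t = 0

    def step(self, arm):
        reward = self.task.pull(arm)
        self.t += 1
        done = self.t >= self.horizon
        return reward, done

def episode_return(policy, env):
    """Monte Carlo rollout: the policy maps history -> action, and the
    return over the episode is the training signal, exactly as in
    ordinary Monte Carlo RL on this one POMDP."""
    env.reset()
    history, total, done = [], 0.0, False
    while not done:
        arm = policy(history)
        reward, done = env.step(arm)
        history.append((arm, reward))
        total += reward
    return total
```

Under this view, a memory-based policy (e.g. an RNN reading `history`) that maximizes expected `episode_return` is just being trained on a single POMDP whose first transition happens to sample the task.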